Today we will recreate the per capita COVID-19 cases world map from the New York Times (NYT) article Coronavirus Map: Tracking the Global Outbreak, which looks like the following:

per capita world cases

# For downloading the map zipped shapefiles
import zipfile
import requests
from io import BytesIO

# For the usual plotting and reading shapefiles
import altair as alt
import pandas as pd
import geopandas as gpd

We will use the JHU CSSE Dataset for the cases as well as the population. For the map we will use the shapefiles from Natural Earth.

population_uri = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/UID_ISO_FIPS_LookUp_Table.csv'
population_data = pd.read_csv(population_uri)

latest_cases_uri = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/10-29-2020.csv'
latest_cases = pd.read_csv(latest_cases_uri)

world_shapefile_uri = "https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_admin_0_countries.zip"
request = requests.get(world_shapefile_uri)
file_ = zipfile.ZipFile(BytesIO(request.content))
file_.extractall()
world_shapefile = 'ne_10m_admin_0_countries.shp'
world = gpd.read_file(world_shapefile)
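
The download step above extracts the archive into the current working directory and assumes it contains the expected `.shp` file. A minimal sketch of a slightly more defensive extraction helper follows. The function name `extract_shapefile` and the `target_dir` parameter are hypothetical names introduced here for illustration and are not part of the original notebook.

```python
import zipfile
from io import BytesIO
from pathlib import Path


def extract_shapefile(zip_bytes, target_dir):
    """Extract an in-memory zip archive and return the path of the first .shp file.

    `extract_shapefile` and `target_dir` are illustrative names (assumptions).
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(BytesIO(zip_bytes)) as archive:
        # Unpack everything, since a shapefile needs its .dbf/.shx companions
        archive.extractall(target)
        shp_names = [name for name in archive.namelist() if name.endswith('.shp')]
    if not shp_names:
        raise ValueError('archive does not contain a .shp file')
    return target / shp_names[0]
```

With the objects defined above, this could be called as `extract_shapefile(request.content, 'natural_earth')` after `request.raise_for_status()`, and the returned path passed to `gpd.read_file`.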

We have to make some changes to get things right. For example, quite a few country names need to be changed because the cases dataset has no identifier information (such as ISO codes) to join on. Then, in the map file, we have to merge a few countries and separate a few others based on how JHU CSSE reports their cases.

In the map:

  • Somalia is a combination of Somalia and Somaliland, because NYT shows them combined
  • Greenland is separate from Denmark: latest_cases reports them together, but NYT shows them separately
latest_cases.head()
   FIPS  Admin2  Province_State  Country_Region  Last_Update          Lat        Long_      Confirmed  Deaths  Recovered  Active   Combined_Key  Incidence_Rate  Case-Fatality_Ratio
0  NaN   NaN     NaN             Afghanistan     2020-10-30 04:24:49  33.93911   67.709953  41268      1532    34239      5497.0   Afghanistan   106.010169      3.712319
1  NaN   NaN     NaN             Albania         2020-10-30 04:24:49  41.15330   20.168300  20315      499     11007      8809.0   Albania       705.921190      2.456313
2  NaN   NaN     NaN             Algeria         2020-10-30 04:24:49  28.03390   1.659600   57332      1949    39635      15748.0  Algeria       130.742614      3.399498
3  NaN   NaN     NaN             Andorra         2020-10-30 04:24:49  42.50630   1.521800   4567       73      3260       1234.0   Andorra       5910.826377     1.598423
4  NaN   NaN     NaN             Angola          2020-10-30 04:24:49  -11.20270  17.873900  10269      275     3736       6258.0   Angola        31.244801       2.677963
latest_cases[latest_cases['Country_Region'].str.contains('Denmark')]
     FIPS  Admin2  Province_State  Country_Region  Last_Update          Lat      Long_     Confirmed  Deaths  Recovered  Active  Combined_Key            Incidence_Rate  Case-Fatality_Ratio
174  NaN   NaN     Faroe Islands   Denmark         2020-10-30 04:24:49  61.8926  -6.9118   494        0       479        15.0    Faroe Islands, Denmark  1010.948532     0.000000
175  NaN   NaN     Greenland       Denmark         2020-10-30 04:24:49  71.7069  -42.6043  17         0       16         1.0     Greenland, Denmark      29.944339       0.000000
176  NaN   NaN     NaN             Denmark         2020-10-30 04:24:49  56.2639  9.5018    44034      716     33601      9717.0  Denmark                 760.228880      1.626016
latest_cases.loc[latest_cases['Province_State']=='Greenland', 'Country_Region'] = "Greenland"

population_data.loc[population_data['Province_State']=='Greenland', 'Country_Region'] = "Greenland"
population_data.loc[population_data['Province_State']=='Greenland', 'Combined_Key'] = "Greenland"
latest_cases = latest_cases.drop(['FIPS', 'Admin2', 'Province_State', 'Last_Update', 'Lat', 'Long_', 'Combined_Key', 'Incidence_Rate', 'Case-Fatality_Ratio'], axis=1)
latest_cases = latest_cases.groupby('Country_Region').aggregate({'Confirmed': 'sum', 'Recovered': 'sum', 'Deaths': 'sum', 'Active': 'sum', })
latest_cases = latest_cases.reset_index()
latest_cases.head()
   Country_Region  Confirmed  Recovered  Deaths  Active
0  Afghanistan     41268      34239      1532    5497.0
1  Albania         20315      11007      499     8809.0
2  Algeria         57332      39635      1949    15748.0
3  Andorra         4567       3260       73      1234.0
4  Angola          10269      3736       275     6258.0
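
To see what the grouping step does, here is a small sketch on a toy frame. The rows reuse the three Denmark-related figures from the output above, with Greenland already relabelled as its own country. The frame name `toy` and the reduced column set are assumptions made for illustration.

```python
import pandas as pd

# Toy subset built from the three Denmark-related rows shown earlier;
# Greenland has already been relabelled as its own Country_Region.
toy = pd.DataFrame({
    'Province_State': ['Faroe Islands', 'Greenland', None],
    'Country_Region': ['Denmark', 'Greenland', 'Denmark'],
    'Confirmed': [494, 17, 44034],
    'Deaths': [0, 0, 716],
})

# Sum the numeric columns per country so each country ends up with one row.
per_country = (
    toy.groupby('Country_Region', as_index=False)
       .agg({'Confirmed': 'sum', 'Deaths': 'sum'})
)
print(per_country)
```

The Faroe Islands row is folded into Denmark, while Greenland keeps its own row because its `Country_Region` was changed before grouping.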
world = world[~(world['CONTINENT']=='Antarctica')]
world = world[['SOVEREIGNT', 'ADMIN', 'NAME', 'POP_EST', 'POP_YEAR', 'ISO_A3', 'CONTINENT', 'geometry']]
world.head()
   SOVEREIGNT  ADMIN      NAME       POP_EST    POP_YEAR  ISO_A3  CONTINENT      geometry
0  Indonesia   Indonesia  Indonesia  260580739  2017      IDN     Asia           MULTIPOLYGON (((117.70361 4.16341, 117.70361 4...
1  Malaysia    Malaysia   Malaysia   31381992   2017      MYS     Asia           MULTIPOLYGON (((117.70361 4.16341, 117.69711 4...
2  Chile       Chile      Chile      17789267   2017      CHL     South America  MULTIPOLYGON (((-69.51009 -17.50659, -69.50611...
3  Bolivia     Bolivia    Bolivia    11138234   2017      BOL     South America  POLYGON ((-69.51009 -17.50659, -69.51009 -17.5...
4  Peru        Peru       Peru       31036656   2017      PER     South America  MULTIPOLYGON (((-69.51009 -17.50659, -69.63832...
alt.Chart(world).mark_geoshape(stroke='white').encode().project('equalEarth')
world[world['NAME'].str.contains('Green')]
     SOVEREIGNT  ADMIN      NAME       POP_EST  POP_YEAR  ISO_A3  CONTINENT      geometry
174  Denmark     Greenland  Greenland  57713    2017      GRL     North America  MULTIPOLYGON (((-40.87580 65.09650, -40.85367 ...
somalia = world[world['NAME'].str.contains('Somali')]
somalia = somalia.dissolve(by='CONTINENT').reset_index()
somalia
   CONTINENT  geometry                                           SOVEREIGNT  ADMIN    NAME     POP_EST  POP_YEAR  ISO_A3
0  Africa     POLYGON ((46.46696 6.53829, 46.48805 6.55864, ...  Somalia     Somalia  Somalia  7531386  2017      SOM
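# Drop the separate Somalia and Somaliland rows before adding the merged shape, so the map has no duplicate rows
world = pd.concat([world[~world['NAME'].str.contains('Somali')], somalia], ignore_index=True)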
alt.Chart(somalia).mark_geoshape().encode()

Making sure that the country names are the same in the map and in the cases data:

world.loc[world['ADMIN'].str.contains('eSwatini'), 'ADMIN'] = 'Eswatini'
world.loc[world['ADMIN'].str.contains('Palestine'), 'ADMIN'] = 'West Bank and Gaza'
world.loc[world['ADMIN'].str.contains('Republic of Serbia'), 'ADMIN'] = 'Serbia'
world.loc[world['ADMIN'].str.contains('United Republic of Tanzania'), 'ADMIN'] = 'Tanzania'
world.loc[world['ADMIN'].str.contains('São Tomé and Principe'), 'ADMIN'] = 'Sao Tome and Principe'
latest_cases.loc[latest_cases['Country_Region']=='Korea, South', 'Country_Region'] = 'South Korea'
latest_cases.loc[latest_cases['Country_Region']=="Cote d'Ivoire", 'Country_Region'] = 'Ivory Coast'
latest_cases.loc[latest_cases['Country_Region']=='Timor-Leste', 'Country_Region'] = 'East Timor'
latest_cases.loc[latest_cases['Country_Region']=='Taiwan*', 'Country_Region'] = 'Taiwan'
latest_cases.loc[latest_cases['Country_Region']=='Burma', 'Country_Region'] = 'Myanmar'
latest_cases.loc[latest_cases['Country_Region']=='US', 'Country_Region'] = 'United States of America'
latest_cases.loc[latest_cases['Country_Region']=='Czech Republic', 'Country_Region'] = 'Czechia'
latest_cases.loc[latest_cases['Country_Region']=='North Macedonia', 'Country_Region'] = 'Macedonia'
latest_cases.loc[latest_cases['Country_Region']=='Bahamas', 'Country_Region'] = 'The Bahamas'
latest_cases.loc[latest_cases['Country_Region']=='Congo (Kinshasa)', 'Country_Region'] = 'Democratic Republic of the Congo'
latest_cases.loc[latest_cases['Country_Region']=='Congo (Brazzaville)', 'Country_Region'] = 'Republic of the Congo'
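
The same renames can also be expressed as one lookup table, which makes it easier to keep several frames in sync because they share a single mapping. This is a sketch of an alternative, not the approach used in this post. The names `JHU_TO_NATURAL_EARTH` and `cases` are illustrative, and the `Confirmed` values are made up.

```python
import pandas as pd

# Illustrative mapping from JHU CSSE names to Natural Earth ADMIN names
# (a subset of the renames listed above).
JHU_TO_NATURAL_EARTH = {
    'Korea, South': 'South Korea',
    "Cote d'Ivoire": 'Ivory Coast',
    'Timor-Leste': 'East Timor',
    'Taiwan*': 'Taiwan',
    'Burma': 'Myanmar',
    'US': 'United States of America',
}

# Made-up frame standing in for latest_cases
cases = pd.DataFrame({'Country_Region': ['US', 'Burma', 'Peru'], 'Confirmed': [3, 2, 1]})

# Series.replace swaps exact matches and leaves every other name untouched
cases['Country_Region'] = cases['Country_Region'].replace(JHU_TO_NATURAL_EARTH)
print(cases)
```

The same dictionary could then be passed to `replace` on any other frame that uses JHU CSSE country names.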

We will ignore the rows that still do not match for now. Two of them are cruise ships (NYT does, however, show these as aggregates of the corresponding countries), and the third is the Holy See.

latest_cases[~latest_cases['Country_Region'].isin(world['ADMIN'])]
     Country_Region    Confirmed  Recovered  Deaths  Active
48   Diamond Princess  712        659        13      40.0
76   Holy See          27         15         0       12.0
105  MS Zaandam        9          0          2       7.0
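
Another way to list names that fail to match is a left merge with `indicator=True`. The sketch below uses two tiny made-up frames (`cases` and `shapes`) rather than the real data, so the names and rows are assumptions for illustration only.

```python
import pandas as pd

# Made-up stand-ins for latest_cases and world
cases = pd.DataFrame({'Country_Region': ['Peru', 'Diamond Princess', 'Chile']})
shapes = pd.DataFrame({'ADMIN': ['Peru', 'Chile', 'Bolivia']})

# A left merge with indicator=True adds a `_merge` column that records
# whether each cases row found a partner in the shapes frame.
checked = cases.merge(shapes, left_on='Country_Region', right_on='ADMIN',
                      how='left', indicator=True)
unmatched = checked.loc[checked['_merge'] == 'left_only', 'Country_Region'].tolist()
print(unmatched)
```

Running the same check in the other direction (shapes on the left) would list map regions that have no case data.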

Extracting the population data for countries:

population_data = population_data.drop(['UID', 'code3', 'FIPS', 'Admin2', 'Province_State', 'Lat', 'Long_'], axis=1)
population_data = population_data[population_data['Country_Region'] == population_data['Combined_Key']]
population_data = population_data.reset_index(drop=True)
population_data.head()
   iso2  iso3  Country_Region  Combined_Key  Population
0  AF    AFG   Afghanistan     Afghanistan   38928341.0
1  AL    ALB   Albania         Albania       2877800.0
2  DZ    DZA   Algeria         Algeria       43851043.0
3  AD    AND   Andorra         Andorra       77265.0
4  AO    AGO   Angola          Angola        32866268.0
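
The filter on `Country_Region == Combined_Key` keeps only country-level rows, on the assumption that sub-national entries carry the province in their `Combined_Key` (as in `Faroe Islands, Denmark`). That also appears to be why Greenland's `Combined_Key` was overwritten earlier. A small sketch with a made-up lookup frame (`lookup`, with invented population numbers):

```python
import pandas as pd

# Made-up lookup rows; the Population values are placeholders, not real figures
lookup = pd.DataFrame({
    'Country_Region': ['Denmark', 'Denmark', 'Greenland'],
    'Combined_Key': ['Denmark', 'Faroe Islands, Denmark', 'Greenland'],
    'Population': [100, 10, 5],
})

# Keep rows whose key is just the country name, i.e. country-level totals
country_level = lookup[lookup['Country_Region'] == lookup['Combined_Key']]
country_level = country_level.reset_index(drop=True)
print(country_level)
```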
population_data.loc[population_data['Country_Region']=='Taiwan*', 'Country_Region'] = 'Taiwan'
population_data.loc[population_data['Country_Region']=='Korea, South', 'Country_Region'] = 'South Korea'
population_data.loc[population_data['Country_Region']=="Cote d'Ivoire", 'Country_Region'] = 'Ivory Coast'
population_data.loc[population_data['Country_Region']=='Timor-Leste', 'Country_Region'] = 'East Timor'
population_data.loc[population_data['Country_Region']=='US', 'Country_Region'] = 'United States of America'
population_data.loc[population_data['Country_Region']=='Czech Republic', 'Country_Region'] = 'Czechia'
population_data.loc[population_data['Country_Region']=='Burma', 'Country_Region'] = 'Myanmar'
population_data.loc[population_data['Country_Region']=='North Macedonia', 'Country_Region'] = 'Macedonia'
population_data.loc[population_data['Country_Region']=='Bahamas', 'Country_Region'] = 'The Bahamas'
population_data.loc[population_data['Country_Region']=='Congo (Kinshasa)', 'Country_Region'] = 'Democratic Republic of the Congo'
population_data.loc[population_data['Country_Region']=='Congo (Brazzaville)', 'Country_Region'] = 'Republic of the Congo'
# Rename ADMIN so that it matches the join key used in the cases and population data
world = world.rename(columns={'ADMIN': 'Country_Region'})
world = world.merge(latest_cases, on='Country_Region', how='left')
world = world.merge(population_data, on='Country_Region', how='left')
world['per_capita'] = world['Confirmed']/world['Population']
world.head()
   SOVEREIGNT  Country_Region  NAME       POP_EST    POP_YEAR  ISO_A3  CONTINENT      geometry                                           Confirmed  Recovered  Deaths   Active   iso2  iso3  Combined_Key  Population   per_capita
0  Indonesia   Indonesia       Indonesia  260580739  2017      IDN     Asia           MULTIPOLYGON (((117.70361 4.16341, 117.70361 4...  404048.0   329778.0   13701.0  60569.0  ID    IDN   Indonesia     273523621.0  0.001477
1  Malaysia    Malaysia        Malaysia   31381992   2017      MYS     Asia           MULTIPOLYGON (((117.70361 4.16341, 117.69711 4...  30090.0    19757.0    246.0    10087.0  MY    MYS   Malaysia      32365998.0   0.000930
2  Chile       Chile           Chile      17789267   2017      CHL     South America  MULTIPOLYGON (((-69.51009 -17.50659, -69.50611...  507050.0   483922.0   14118.0  9011.0   CL    CHL   Chile         19116209.0   0.026525
3  Bolivia     Bolivia         Bolivia    11138234   2017      BOL     South America  POLYGON ((-69.51009 -17.50659, -69.51009 -17.5...  141484.0   110759.0   8705.0   22020.0  BO    BOL   Bolivia       11673029.0   0.012121
4  Peru        Peru            Peru       31036656   2017      PER     South America  MULTIPOLYGON (((-69.51009 -17.50659, -69.63832...  894928.0   819717.0   34315.0  40896.0  PE    PER   Peru          32971846.0   0.027142
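
As a quick sanity check on the `per_capita` column, the arithmetic for a single row can be reproduced by hand. The sketch below uses the Chile figures from the table above. The variable names are illustrative.

```python
# Figures for Chile taken from the merged table above
confirmed = 507050
population = 19116209

per_capita = confirmed / population        # fraction of the population with a confirmed case
per_100k = per_capita * 100_000            # the same quantity per 100,000 people
one_in_n = round(population / confirmed)   # the "1 in N" form used in the tooltip

print(round(per_capita, 6), round(per_100k), one_in_n)
```

The rounded fraction agrees with the 0.026525 shown for Chile, which corresponds to roughly 1 in 38 people.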
world['code'] = world['per_capita'].apply(lambda x: 'Less than 1 in 1000' if x <= (1/1000) else 'Less than 1 in 500' if x<= (1/500) else 'Less than 1 in 333' if x<= (1/333) else 'No Cases reported' if pd.isnull(x) else 'Greater than 1 in 333')
world['Share of Population'] = (world['Population']/world['Confirmed']).round()
# Rows without case data would otherwise read "1 in nan" in the tooltip
world['Share of Population'] = world['Share of Population'].apply(lambda x: 'No data' if pd.isnull(x) else f"1 in {x:.0f}")
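
The nested conditional that builds the `code` column can also be written with `pd.cut`, which keeps the thresholds and labels side by side. This is a sketch of an alternative under the same thresholds, not the code used in this post. The names `edges`, `labels`, and the sample `per_capita` series are assumptions for illustration.

```python
import pandas as pd

labels = ['Less than 1 in 1000', 'Less than 1 in 500',
          'Less than 1 in 333', 'Greater than 1 in 333']
# Right-closed bins reproduce the `<=` comparisons of the original lambda
edges = [-float('inf'), 1/1000, 1/500, 1/333, float('inf')]

# Sample values, one per bucket, plus a missing value
per_capita = pd.Series([0.0005, 0.0015, 0.0025, 0.026525, float('nan')])

# pd.cut leaves missing values as NaN, so they are labelled afterwards
code = (
    pd.cut(per_capita, bins=edges, labels=labels)
      .astype(object)
      .fillna('No Cases reported')
)
print(code.tolist())
```

Casting to `object` before `fillna` avoids having to register 'No Cases reported' as an extra category first.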
alt.Chart(world).mark_geoshape(stroke='white').transform_filter(alt.datum.Country_Region != 'Antarctica').encode(
    color=alt.Color(
        'code:N',
        scale=alt.Scale(
            domain=['No Cases reported', 'Less than 1 in 1000', 'Less than 1 in 500', 'Less than 1 in 333', 'Greater than 1 in 333'],
            range=['lightgrey', '#f2df91', '#ffae43', '#ff6e0b', '#ce0a05'],
        ),
        legend=alt.Legend(title=None, orient='top', labelBaseline='middle', symbolType='square', columnPadding=20, labelFontSize=15, gridAlign='each', symbolSize=200),
    ),
    tooltip=['Country_Region', 'Confirmed', 'Share of Population']
).properties(width=1400, height=800).project('equalEarth').configure_view(strokeWidth=0)